Machine Learning Analysis Report

Generated on August 03, 2025 at 09:11 PM

Machine Learning Analysis Pipeline

EDR: Dataset Loading & Preprocessing

EDR – Train/Test Overview
• Train shape: (9561, 20) | Test shape: (818, 20)
• Total train samples: 9,561 | Total test samples: 818
• Number of features: 16
• Target column: 'label'
• Missing values (train): 0 | (test): 0
EDR – Train Class Distribution
• 0: 8,704
• 1: 857
• Class balance (minority/majority): 9.8460%
EDR – Feature Preparation
• Target encoding: {0: 0, 1: 1}
• Data preprocessing: Infinite values handled, missing values filled with train medians
• Feature scaling: StandardScaler (fit on train, applied to test)
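The preparation steps listed above can be sketched in plain Python for a single feature column. This is an illustrative stand-in, not the pipeline's code: the report does not say how infinite values were handled, so the sketch assumes they are treated as missing and filled with the train median. The scaling step mimics StandardScaler by using the population standard deviation. The function names and toy values are hypothetical.

```python
import math
import statistics

def _is_missing(v):
    """Treat None, NaN and +/-inf as missing (an assumption, see lead-in)."""
    return v is None or not math.isfinite(v)

def fit_column(train_col):
    """Learn the median, mean and population std from the train split only."""
    finite = [v for v in train_col if not _is_missing(v)]
    median = statistics.median(finite)
    filled = [median if _is_missing(v) else v for v in train_col]
    mean = statistics.fmean(filled)
    std = statistics.pstdev(filled) or 1.0  # avoid dividing by zero on constant columns
    return median, mean, std

def transform_column(col, median, mean, std):
    """Apply the train-derived statistics to any split (train or test)."""
    return [((median if _is_missing(v) else v) - mean) / std for v in col]

# Toy data: statistics come from train, then are reused unchanged on test.
train_col = [1.0, 2.0, float("inf"), 3.0, float("nan")]
test_col = [float("-inf"), 4.0]
median, mean, std = fit_column(train_col)
scaled_train = transform_column(train_col, median, mean, std)
scaled_test = transform_column(test_col, median, mean, std)
```

Fitting on train and reusing the same statistics on test keeps test-set information out of the preprocessing, which is what "fit on train, applied to test" describes.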
Baseline (Most-Frequent) Accuracy: 0.9095
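The baseline figure can be reproduced from counts that appear elsewhere in this report. A short check, assuming the baseline is the majority-class share of the test set (the per-class test supports of 744 and 74 come from the classification reports); the variable names are illustrative.

```python
train_counts = {0: 8704, 1: 857}  # train class distribution
test_counts = {0: 744, 1: 74}     # per-class support in the classification reports

def majority_share(counts):
    """Accuracy of always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())

minority_to_majority = min(train_counts.values()) / max(train_counts.values())
baseline_test = majority_share(test_counts)
baseline_train = majority_share(train_counts)

print(f"minority/majority (train): {minority_to_majority:.4%}")
print(f"most-frequent accuracy (test):  {baseline_test:.4f}")
print(f"most-frequent accuracy (train): {baseline_train:.4f}")
```

The test-set share rounds to the reported 0.9095. Every model accuracy in the EDR and XDR comparison tables is below this value, so accuracy alone is a poor guide here and the class-1 metrics carry more information.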

EDR: Model Performance Comparison

EDR – Model Performance Metrics

Model                  Accuracy  Balanced Acc  Precision  Recall  F1      ROC-AUC  PR-AUC
Logistic Regression    0.8631    0.6327        0.2889     0.3514  0.3171  0.6122   0.2094
Random Forest (SMOTE)  0.8007    0.6775        0.2335     0.5270  0.3237  0.8286   0.2972
LightGBM               0.7995    0.6890        0.2384     0.5541  0.3333  0.8431   0.3639
Balanced RF            0.8447    0.6834        0.2880     0.4865  0.3618  0.8447   0.3575
SGD SVM                0.8753    0.5725        0.2586     0.2027  0.2273  n/a      n/a
IsolationForest        0.8447    0.5496        0.1728     0.1892  0.1806  n/a      n/a
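ROC-AUC and PR-AUC are missing for SGD SVM and IsolationForest. The report does not say why; a plausible reading (an assumption) is that these models produced no probability estimates. ROC-AUC only needs a ranking score, so a raw decision score would be enough. The sketch below computes ROC-AUC as the fraction of positive/negative pairs ranked correctly; the function name and toy scores are hypothetical.

```python
def roc_auc_from_scores(y_true, scores):
    """ROC-AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win. Quadratic in the number of samples, which is
    fine for a test set of a few hundred rows.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy decision scores (higher = more likely class 1); not taken from the report.
y_toy = [0, 0, 1, 0, 1, 1]
score_toy = [-2.1, -0.4, -0.3, 0.2, 0.9, 1.5]
auc_toy = roc_auc_from_scores(y_toy, score_toy)
```

In scikit-learn, both `SGDClassifier` and `IsolationForest` expose `decision_function`. For `IsolationForest` lower values are more anomalous, so the score would need its sign flipped if class 1 denotes the anomaly.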

Confusion Matrix Analysis

Model                  TN   FP   FN  TP  FP Rate  Miss Rate
Logistic Regression    680  64   48  26  8.60%    64.86%
Random Forest (SMOTE)  616  128  35  39  17.20%   47.30%
LightGBM               613  131  33  41  17.61%   44.59%
Balanced RF            655  89   38  36  11.96%   51.35%
SGD SVM                701  43   59  15  5.78%    79.73%
IsolationForest        677  67   60  14  9.01%    81.08%
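Every threshold-dependent number in the metrics table follows from these four counts. A minimal sketch, using the LightGBM row as input; the function name is illustrative, and the formulas are the standard definitions rather than code taken from the pipeline.

```python
def derive_metrics(tn, fp, fn, tp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "balanced_acc": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fp_rate": fp / (fp + tn),    # 1 - specificity
        "miss_rate": fn / (fn + tp),  # 1 - recall
    }

lightgbm = derive_metrics(tn=613, fp=131, fn=33, tp=41)
for name, value in lightgbm.items():
    print(f"{name:<13}{value:.4f}")
```

The sketch does not guard against empty denominators; a model that predicts no positives at all would need an explicit zero-division rule for precision and F1.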

Best Models by Metric

Metric                      Best model           Value
Accuracy                    SGD SVM              0.8753
Balanced Acc                LightGBM             0.6890
Precision                   Logistic Regression  0.2889
Recall                      LightGBM             0.5541
F1                          Balanced RF          0.3618
ROC-AUC                     Balanced RF          0.8447
PR-AUC                      LightGBM             0.3639
Lowest False Positive Rate  SGD SVM              5.78%
Lowest Miss Rate            LightGBM             44.59%
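The per-metric winners can be recomputed from the EDR metrics table. The sketch below is one way to do it, with the table values copied in by hand and the missing ROC-AUC and PR-AUC entries represented as None; the names `EDR_RESULTS` and `best_by_metric` are hypothetical.

```python
METRICS = ["accuracy", "balanced_acc", "precision", "recall", "f1", "roc_auc", "pr_auc"]

# Values copied from the EDR metrics table; None marks entries reported as nan.
EDR_RESULTS = {
    "Logistic Regression":   (0.8631, 0.6327, 0.2889, 0.3514, 0.3171, 0.6122, 0.2094),
    "Random Forest (SMOTE)": (0.8007, 0.6775, 0.2335, 0.5270, 0.3237, 0.8286, 0.2972),
    "LightGBM":              (0.7995, 0.6890, 0.2384, 0.5541, 0.3333, 0.8431, 0.3639),
    "Balanced RF":           (0.8447, 0.6834, 0.2880, 0.4865, 0.3618, 0.8447, 0.3575),
    "SGD SVM":               (0.8753, 0.5725, 0.2586, 0.2027, 0.2273, None, None),
    "IsolationForest":       (0.8447, 0.5496, 0.1728, 0.1892, 0.1806, None, None),
}

def best_by_metric(results, metrics=METRICS):
    """Return {metric: (model, value)} for the highest value of each metric."""
    best = {}
    for i, metric in enumerate(metrics):
        scored = {m: vals[i] for m, vals in results.items() if vals[i] is not None}
        winner = max(scored, key=scored.get)
        best[metric] = (winner, scored[winner])
    return best

best = best_by_metric(EDR_RESULTS)
for metric, (model, value) in best.items():
    print(f"{metric:<13}{model:<22}{value:.4f}")
```

Several margins are narrow: Logistic Regression leads Balanced RF on precision by 0.0009, and Balanced RF leads LightGBM on ROC-AUC by 0.0016, so the ranking on those metrics should be read with caution.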

EDR – Metrics by Model

EDR – ROC Curves

EDR – Precision–Recall Curves

EDR – Predicted Probability Distributions

EDR – Threshold Sweep
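No threshold-sweep data survives in this text version of the report. As an illustration of what such a sweep computes, the sketch below evaluates precision, recall and F1 at several decision thresholds. The scores and labels are toy values, not results from the report, and the function name is hypothetical.

```python
def sweep_thresholds(y_true, y_prob, thresholds):
    """Precision, recall and F1 when predicting class 1 for y_prob >= threshold."""
    rows = []
    for t in thresholds:
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 0)
        fn = sum(1 for y, p in zip(y_true, y_prob) if p < t and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append({"threshold": t, "precision": precision, "recall": recall, "f1": f1})
    return rows

# Toy predicted probabilities for eight samples (hypothetical).
y_toy = [0, 0, 0, 0, 1, 0, 1, 1]
p_toy = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
rows = sweep_thresholds(y_toy, p_toy, thresholds=[0.3, 0.5, 0.7])
for row in rows:
    print(row)
```

Raising the threshold can only keep recall the same or lower it, while precision usually rises. With miss rates between 44% and 81% at the default operating point, a lower threshold is one lever for trading false positives against missed detections.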

EDR: Logistic Regression – Detailed Analysis

EDR – Logistic Regression: Confusion Matrix

EDR – Logistic Regression: Classification Report

Class     precision  recall  f1      support
0         0.9341     0.9140  0.9239  744
1         0.2889     0.3514  0.3171  74
accuracy  n/a        n/a     0.8631  818
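The per-class rows of a classification report are enough to recover the summary metrics. A small check using the Logistic Regression values above; the relations are standard identities, and the variable names are illustrative.

```python
# Per-class values copied from the EDR Logistic Regression classification report.
report = {
    0: {"precision": 0.9341, "recall": 0.9140, "f1": 0.9239, "support": 744},
    1: {"precision": 0.2889, "recall": 0.3514, "f1": 0.3171, "support": 74},
}

def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

total = sum(row["support"] for row in report.values())

# Balanced accuracy: unweighted mean of the per-class recalls.
balanced_acc = sum(row["recall"] for row in report.values()) / len(report)

# Accuracy: support-weighted mean of the per-class recalls.
accuracy = sum(row["recall"] * row["support"] for row in report.values()) / total

f1_by_class = {c: f1_score(row["precision"], row["recall"]) for c, row in report.items()}
```

Because class 0 supplies 744 of the 818 test samples, accuracy is dominated by the class-0 recall, while balanced accuracy weights both classes equally. This is why the two numbers diverge so sharply for every model in this report.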

EDR – Logistic Regression: Feature Importance

EDR: Random Forest (SMOTE) – Detailed Analysis

EDR – Random Forest (SMOTE): Confusion Matrix

EDR – Random Forest (SMOTE): Classification Report

Class     precision  recall  f1      support
0         0.9462     0.8280  0.8832  744
1         0.2335     0.5270  0.3237  74
accuracy  n/a        n/a     0.8007  818
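The oversampling step in this model's name can be illustrated with a stripped-down version of the SMOTE idea: each synthetic minority sample is placed on the line segment between a minority point and one of its nearest minority neighbours. This is a sketch of the technique only. The report does not give the SMOTE settings or implementation used, and the function name, `k` and the toy points are hypothetical.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Toy minority samples in two dimensions (hypothetical).
minority_toy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote_like(minority_toy, n_new=6)
```

Oversampling of this kind is normally applied to the training split only, so that the test set keeps its original class ratio.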

EDR – Random Forest (SMOTE): Feature Importance

EDR: LightGBM – Detailed Analysis

EDR – LightGBM: Confusion Matrix

EDR – LightGBM: Classification Report

Class     precision  recall  f1      support
0         0.9489     0.8239  0.8820  744
1         0.2384     0.5541  0.3333  74
accuracy  n/a        n/a     0.7995  818

EDR – LightGBM: Feature Importance

EDR: Balanced RF – Detailed Analysis

EDR – Balanced RF: Confusion Matrix

EDR – Balanced RF: Classification Report

Class     precision  recall  f1      support
0         0.9452     0.8804  0.9116  744
1         0.2880     0.4865  0.3618  74
accuracy  n/a        n/a     0.8447  818

EDR – Balanced RF: Feature Importance

EDR: SGD SVM – Detailed Analysis

EDR – SGD SVM: Confusion Matrix

EDR – SGD SVM: Classification Report

Class     precision  recall  f1      support
0         0.9224     0.9422  0.9322  744
1         0.2586     0.2027  0.2273  74
accuracy  n/a        n/a     0.8753  818

EDR – SGD SVM: Feature Importance

EDR: IsolationForest – Detailed Analysis

EDR – IsolationForest: Confusion Matrix

EDR – IsolationForest: Classification Report

Class     precision  recall  f1      support
0         0.9186     0.9099  0.9142  744
1         0.1728     0.1892  0.1806  74
accuracy  n/a        n/a     0.8447  818

EDR – IsolationForest: Feature Importance

Feature importance not available for this model type.
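For models without built-in importances, a model-agnostic estimate such as permutation importance is one option: shuffle one feature at a time and measure how much a chosen score drops. The sketch below shows the idea with a toy predictor. It is not part of the reported pipeline, and the function names and data are hypothetical; scikit-learn offers a full implementation as `sklearn.inspection.permutation_importance`.

```python
import random

def accuracy(y_true, y_pred):
    return sum(int(a == b) for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean drop in metric when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy predictor that looks only at feature 0; feature 1 is ignored.
def toy_predict(X):
    return [1 if row[0] > 0.5 else 0 for row in X]

X_toy = [[0.1, 5.0], [0.9, 3.0], [0.2, 8.0], [0.8, 1.0],
         [0.3, 7.0], [0.7, 2.0], [0.4, 6.0], [0.6, 4.0]]
y_toy = [0, 1, 0, 1, 0, 1, 0, 1]
importances = permutation_importance(toy_predict, X_toy, y_toy, accuracy)
```

Shuffling the ignored feature leaves every prediction unchanged, so its importance is exactly zero, while shuffling the feature the predictor relies on can only lower the score.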

XDR: Dataset Loading & Preprocessing

XDR – Train/Test Overview
• Train shape: (9561, 34) | Test shape: (818, 34)
• Total train samples: 9,561 | Total test samples: 818
• Number of features: 30
• Target column: 'label'
• Missing values (train): 0 | (test): 0
XDR – Train Class Distribution
• 0: 8,704
• 1: 857
• Class balance (minority/majority): 9.8460%
XDR – Feature Preparation
• Target encoding: {0: 0, 1: 1}
• Data preprocessing: Infinite values handled, missing values filled with train medians
• Feature scaling: StandardScaler (fit on train, applied to test)
Baseline (Most-Frequent) Accuracy: 0.9095

XDR: Model Performance Comparison

XDR – Model Performance Metrics

Model                  Accuracy  Balanced Acc  Precision  Recall  F1      ROC-AUC  PR-AUC
Logistic Regression    0.8484    0.5820        0.2159     0.2568  0.2346  0.5956   0.1997
Random Forest (SMOTE)  0.8778    0.6468        0.3375     0.3649  0.3506  0.8295   0.3570
LightGBM               0.8778    0.6833        0.3587     0.4459  0.3976  0.8621   0.3801
Balanced RF            0.8619    0.6807        0.3178     0.4595  0.3757  0.8437   0.3726
SGD SVM                0.7323    0.5182        0.1038     0.2568  0.1479  n/a      n/a
IsolationForest        0.8851    0.5474        0.2500     0.1351  0.1754  n/a      n/a
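Since the EDR and XDR runs share the same models and the same test size, their F1 scores can be compared directly. The sketch below computes the XDR minus EDR difference per model from the two metrics tables; the dictionary names are illustrative.

```python
# F1 values copied from the EDR and XDR model performance tables.
edr_f1 = {
    "Logistic Regression": 0.3171,
    "Random Forest (SMOTE)": 0.3237,
    "LightGBM": 0.3333,
    "Balanced RF": 0.3618,
    "SGD SVM": 0.2273,
    "IsolationForest": 0.1806,
}
xdr_f1 = {
    "Logistic Regression": 0.2346,
    "Random Forest (SMOTE)": 0.3506,
    "LightGBM": 0.3976,
    "Balanced RF": 0.3757,
    "SGD SVM": 0.1479,
    "IsolationForest": 0.1754,
}

delta = {model: round(xdr_f1[model] - edr_f1[model], 4) for model in edr_f1}
improved = {model for model, d in delta.items() if d > 0}

for model, d in sorted(delta.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<22}{d:+.4f}")
```

On F1, the three tree ensembles improve with the XDR feature set, while Logistic Regression, SGD SVM and IsolationForest get worse. The report gives no variance estimates, so these differences come from a single train/test split.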

Confusion Matrix Analysis

Model                  TN   FP   FN  TP  FP Rate  Miss Rate
Logistic Regression    675  69   55  19  9.27%    74.32%
Random Forest (SMOTE)  691  53   47  27  7.12%    63.51%
LightGBM               685  59   41  33  7.93%    55.41%
Balanced RF            671  73   40  34  9.81%    54.05%
SGD SVM                580  164  55  19  22.04%   74.32%
IsolationForest        714  30   64  10  4.03%    86.49%
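With only 74 positive test samples, one additional true positive moves recall by about 1.35 percentage points, and the gap between Balanced RF and LightGBM on miss rate is a single sample (34 versus 33 true positives). A Wilson score interval gives a rough sense of that uncertainty. This is a standard statistical formula added for illustration, not something the report computes; the function name is hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Recall = TP / (TP + FN), using the XDR confusion-matrix counts.
lgbm_low, lgbm_high = wilson_interval(33, 74)  # LightGBM
brf_low, brf_high = wilson_interval(34, 74)    # Balanced RF

print(f"LightGBM recall interval:    {lgbm_low:.3f} to {lgbm_high:.3f}")
print(f"Balanced RF recall interval: {brf_low:.3f} to {brf_high:.3f}")
```

The LightGBM interval spans roughly 0.34 to 0.56 and overlaps almost entirely with the Balanced RF interval, so "lowest miss rate" should not be read as a reliable ordering of these two models.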

Best Models by Metric

Metric                      Best model       Value
Accuracy                    IsolationForest  0.8851
Balanced Acc                LightGBM         0.6833
Precision                   LightGBM         0.3587
Recall                      Balanced RF      0.4595
F1                          LightGBM         0.3976
ROC-AUC                     LightGBM         0.8621
PR-AUC                      LightGBM         0.3801
Lowest False Positive Rate  IsolationForest  4.03%
Lowest Miss Rate            Balanced RF      54.05%

XDR – Metrics by Model

XDR – ROC Curves

XDR – Precision–Recall Curves

XDR – Predicted Probability Distributions

XDR – Threshold Sweep
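When reading the PR-AUC column, the chance level is the positive prevalence of the test set, 74/818 or about 0.0905, rather than 0.5. The reported XDR values of 0.1997 to 0.3801 are therefore roughly two to four times chance. The report does not state how PR-AUC was computed; the sketch below assumes average precision, a common choice, and uses toy scores. The function name is hypothetical.

```python
def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive sample.

    Assumes no tied scores; ties would need to be grouped by threshold.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos

prevalence = 74 / 818  # chance-level PR-AUC for this test set

# Toy scores (hypothetical), higher = more likely class 1.
y_toy = [0, 0, 1, 0, 1, 1]
score_toy = [0.10, 0.40, 0.35, 0.80, 0.65, 0.90]
ap_toy = average_precision(y_toy, score_toy)
```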

XDR: Logistic Regression – Detailed Analysis

XDR – Logistic Regression: Confusion Matrix

XDR – Logistic Regression: Classification Report

Class     precision  recall  f1      support
0         0.9247     0.9073  0.9159  744
1         0.2159     0.2568  0.2346  74
accuracy  n/a        n/a     0.8484  818

XDR – Logistic Regression: Feature Importance

XDR: Random Forest (SMOTE) – Detailed Analysis

XDR – Random Forest (SMOTE): Confusion Matrix

XDR – Random Forest (SMOTE): Classification Report

Class     precision  recall  f1      support
0         0.9363     0.9288  0.9325  744
1         0.3375     0.3649  0.3506  74
accuracy  n/a        n/a     0.8778  818

XDR – Random Forest (SMOTE): Feature Importance

XDR: LightGBM – Detailed Analysis

XDR – LightGBM: Confusion Matrix

XDR – LightGBM: Classification Report

Class     precision  recall  f1      support
0         0.9435     0.9207  0.9320  744
1         0.3587     0.4459  0.3976  74
accuracy  n/a        n/a     0.8778  818

XDR – LightGBM: Feature Importance

XDR: Balanced RF – Detailed Analysis

XDR – Balanced RF: Confusion Matrix

XDR – Balanced RF: Classification Report

Class     precision  recall  f1      support
0         0.9437     0.9019  0.9223  744
1         0.3178     0.4595  0.3757  74
accuracy  n/a        n/a     0.8619  818

XDR – Balanced RF: Feature Importance

XDR: SGD SVM – Detailed Analysis

XDR – SGD SVM: Confusion Matrix

XDR – SGD SVM: Classification Report

Class     precision  recall  f1      support
0         0.9134     0.7796  0.8412  744
1         0.1038     0.2568  0.1479  74
accuracy  n/a        n/a     0.7323  818

XDR – SGD SVM: Feature Importance

XDR: IsolationForest – Detailed Analysis

XDR – IsolationForest: Confusion Matrix

XDR – IsolationForest: Classification Report

Class     precision  recall  f1      support
0         0.9177     0.9597  0.9382  744
1         0.2500     0.1351  0.1754  74
accuracy  n/a        n/a     0.8851  818

XDR – IsolationForest: Feature Importance

Feature importance not available for this model type.